Statistical mechanics of secondary structures formed by random RNA sequences 
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The formation of secondary structures by a random RNA sequence is studied as a model system 
for the sequence-structure problem omnipresent in biopolymers. Several toy energy models are 
introduced to allow detailed analytical and numerical studies. First, a two-replica calculation is 
performed. By mapping the two-replica problem to the denaturation of a single homogeneous RNA 
in 6-dimensional embedding space, we show that sequence disorder is perturbatively irrelevant, 
i.e., an RNA molecule with weak sequence disorder is in a molten phase where many secondary 
structures with comparable total energy coexist. A numerical study of various models at high 
temperature reproduces behaviors characteristic of the molten phase. On the other hand, a scaling 
argument based on the extremal statistics of rare regions can be constructed to show that the low 
temperature phase is unstable to sequence disorder. We performed a detailed numerical study of 
the low temperature phase using the droplet theory as a guide, and characterized the statistics 
of large-scale, low-energy excitations of the secondary structures from the ground state structure. 
We find the excitation energy to grow very slowly (i.e., logarithmically) with the length scale of 
the excitation, suggesting the existence of a marginal glass phase. The transition between the low 
temperature glass phase and the high temperature molten phase is also characterized numerically. 
It is revealed by a change in the coefficient of the logarithmic excitation energy, from being disorder 
dominated to entropy dominated. 

PACS numbers: 87.15.Aa, 05.40.-a, 87.15.Cc, 64.60.Fr 



I. INTRODUCTION 

RNA is an important biopolymer critical to all living 
systems |ll and may be the crucial entity in prebiotic 
evolution 0. Like for DNA, there are four types of nu- 
cleotides (or bases) A, C, G, and U which, when poly- 
merized can form double helical structures consisting of 
stacks of stable Watson-Crick pairs (A with U or G with 
G). However unlike a long polymer of DNA, which is of- 
ten accompanied by a complementary strand and forms 
otherwise featureless double helical structures, a polymer 
of RNA usually "operates" in the single-strand mode. It 
bends onto itself and forms elaborate spatial structures 
in order for bases located on different parts of the back- 
bone to pair with each other, similar conceptually to how 
the sequence of an amino acid encodes the structure of a 
protein. 

Understanding the encoding of structure from the pri- 
mary sequence has been an outstanding problem of the- 
oretical biophysics. Most theoretical work in the past 
decade have been focused on the problem of protein 
folding, which is very difficult analytically and numeri- 
cally P, 0, H, H]. Here, we study the problem of RNA fold- 
ing, specifically the formation of RNA secondary struc- 
tures. For RNA, the restriction to secondary structures 
is meaningful due to a separation of energy scales. It 
is this restriction that makes the RNA folding problem 
amenable to detailed analytical and numerical studies p| . 
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There exist efRcient algorithms to compute the exact par - 
tition function of RNA secondary structures |g, |^, |0|. 
Together with the availability of carefully measured free 
energy parameters |ll| describing the formation of vari- 
ous microscopic structures (e.g., stacks, loops, hairpins, 
etc.), the probable secondary structures formed by any 
given RNA molecule of up to a few thousand bases can 
be obtained readily. On the experimental side, RNA 
molecules of 10^ - 10^ bases in length are available. Fur- 
thermore, the restriction to secondary structures can be 
physically enforced in a salt solution with monovalent 
ions, e.g., Na , so that controlled experiments arc in 
principle possible ^% . 

In this work, we are not concerned with the structure 
formed by a specific sequence. Instead, we will study 
the statistics of secondary structures formed by the en- 
semble of long random RNA sequences (of at least a 
few thousand bases in length in practice). Such knowl- 
edge may be of value in detecting important structural 
components in messenger RNAs which may otherwise be 
regarded as random from the structural perspective, in 
understanding how functional RNAs arise from random 
RNA sequences 0, or in characterizing the response of 
a long single-stranded DNA molecule to external pulling 
forces [|l3[ . More significantly from the theoretical point 
of view, the RNA secondary structure problem presents 
a rare tractable model of a random heteropolymer where 
concrete progress can be made regarding the thermody- 
namic properties 0, |lj, |l^, |l6|, 0, |8| Q . Nevertheless, 
there are many gaps in our understanding. This paper 
is a detailed report of our on-going effort in this regard. 
It provides a self-contained introduction of the random 
RNA problem to statistical physicists as a novel problem 



of disordered systems, and depicts several approaches we 
have tried to characterize this system. 

The manuscript is organized as follows: In Sec. II, we 
provide a detailed introduction to the phenomenology of 
RNA secondary structure formation. We review the key 
simplifications which form the basis of efficient comput- 
ing as well as exact solutions in some cases. We also 
review the properties of the "molten phase" , which is the 
simplest possible phase of the system assuming sequence 
disorder is not relevant. In Sec. Ill, we consider the ef- 
fect of sequence disorder at high temperatures. We show 
numerical evidence that the random RNA sequence is in 
the molten phase at sufficiently high temperatures, and 
support this conclusion by solving the two-replica sys- 
tem which can be regarded as a perturbative study on 
the stability of the molten phase. In Sec. |^, we pro- 
vide a scaling argument, and show why the molten phase 
should break down at low enough temperatures. This is 
followed by a detailed numerical study of the low temper- 
ature regime. We apply the droplet picture and charac- 
terize the statistics of large-scale, low-energy excitations 
of the secondary structures from the ground state struc- 
ture. Our results support the existence of a very weak 
(i.e., marginal) glass phase characterized by logarithmic 
excitation energies. Finally, we describe the intermediate 
temperature regime where the system makes the transi- 
tion from the glass phase to the molten phase. The so- 
lution of the two-replica problem is relegated to the ap- 
pendices. We present two approaches: In Appendix |A|, 
we provide a mapping of the two-replica problem to the 
denaturation of an effective single RNA in 6-dimensional 
embedding space; this approach highlights the connec- 
tion of the RNA problem to the self-consistent Hartree 
theory and should be most natural to field theorists. In 
Appendices O and O, we present the exact solution. It 
is hoped that the two-replica solution may be helpful in 
providing the intuition needed to tackle the full n-replica 
problem. 



II. REVIEW OF RNA SECONDARY 
STRUCTURE 

A. Model and definitions 



shown can be divided into stems of consecutive base pairs 
and loops which connect or terminate these stems. In 
naturally occurring RNA molecules, the stems typically 
comprise on the order of five base pairs. They locally 
form the same double helical structure as DNA molecules. 
However, while the latter typically occur in complemen- 
tary pairs and bind to each other, RNA molecules are 
mostly single-stranded and hence must fold back onto 
themselves in order to gain some base pairings. 




FIG. 1: Diagramatic representation of an RNA secondary 
structure: The solid line symbolizes the backbone of the 
molecule while the dashed lines stand for the hydrogen- 
bonded base pairs formed. The backbone is shaped such that 
stems of subsequent base pairs and the loops connecting or 
terminating them can be clearly seen. These stems form dou- 
ble helical structures similar to that of DNA. 




FIG. 2: Pseudo-knots in RNA structures: The base pairings 
indicated by the arrow in (a) create a pseudo-knot. We ex- 
clude such configurations in our definition of secondary struc- 
tures: The short pseudo-knots (called "kissing hairpins") as 
shown in (b) do not contribute much to the total binding en- 
ergy, and the long ones shown in (c) are kinetically forbidden 
since the double helical structure would require threading one 
end of the molecule through its loops many times. 



1. Secondary structures 

The secondary structure of an RNA describes the con- 
figuration of base pairings formed by the polymer. If the 
pairing of the i**^ and j'^ bases in a polymer of N total 
bases is denoted by (i, j) with 1 < i < j < N, then each 
secondary structure S is defined by a list of such pair- 
ings, with each position appearing at most once in the 
list, and with the pairs subject to a certain restriction to 
be described shortly below. Each such structure can be 
represented by a diagram as shown in Fig. nl where the 
solid line symbolizes the backbone of the molecule and 
the dashed lines stand for base pairings. The structure 



By a secondary structure, one often considers only the 
restricted set of base pairings where any two base pairs 
(i,j) and {k,l) in a given secondary structure are ei- 
ther independent, i.e., i < j < k < I, oi nested, i.e., 
i < k < I < j. This excludes the so-called pseudo- 
knots (as exemplified by Fig. g) and makes analytical 
and numerical studies much more tractable. For an RNA 
molecule, the exclusion of pseudo-knots is a reasonable 
approximation because the long pseudo-knots are kinet- 
ically difficult to form, and even the short ones occur 
infrequently in natural RNA structures ||l^. The lat- 
ter is due to their relatively low binding energies for 
short sequences and the strong electrostatic repulsion of 



the backbone — because the polymer backbone is highly 
charged and pseudo-knotted configurations increase the 
density of the molecule, their formation can be relatively 
disfavored in low salt solution. Similarly, the tertiary 
structures which involve additional interactions of paired 
bases are strongly dependent on electrostatic screening 
and can be "turned off' experimentally by using mono- 
valent salt solution |1^. Indeed, the pseudo-knots are 
often deemed part of the tertiary structure of an RNA 
molecule. Throughout this study, we will exclude pseudo- 
knots in our definition of secondary structures. Without 
the pseudo-knots, a secondary structure can alternatively 
be represented by a diagram of non-crossing arches or by 
a "mountain" diagram as shown in Fig. ^. 
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FIG. 3: Abstract representations of the RNA secondary struc- 
ture shown in Fig. hi: In (a) the solid line symbolizes the 
stretched out backbone of the molecule while the dashed 
arches stand for the base pairs formed. Due to the no pseudo- 
knot constraint two arches never cross, (b) shows an equiv- 
alent representation as a "mountain diagram". It is a line 
derived from the arch diagram by going along the backbone 
from left to right and going one step up for every beginning 
arch, horizontally for each unbound base, and one step down 
for each ending arch. Such a mountain always stays above 
the baseline and comes back to the baseline at base A^. 



2. Interaction energies 

In order to calculate Boltzmann factors within an en- 
semble of secondary structures, we need to assign an en- 
ergy E\S\ to each structure S. Each secondary structure 
can be decomposed into elementary pieces such as the 
stems of base pairs and the connecting loop regions as 
shown in Fig. yj. A common approach is to assume that 
the contributions from these structural elements to the 
total energy are independent of each other and additive. 

Within a stem of base pairs, the largest energy contri- 
bution is the stacking energy between two adjacent base 
pairs (G-C, A-U, or G-U), and the total energy of the 
stem is the sum of stacking energies over all adjacent 
base pairs. Since each secondary structure is defined as 
a single state in our ensemble, it is necessary to inte- 
grate out all other microscopic degrees of freedom of the 



bases within a given secondary structure and use an effec- 
tive energy parameter for each base stacking. The most 
convenient one to use is the Gibbs free energy of stack- 
ing |ll[] , which contains an enthalpic term due to base 
stacking, and an cntropic term due to the loss of single- 
stranded degrees of freedom (as well as the additional 
conformational change of the backbone and even the sur- 
rounding water molecules) due to base pairing. The mag- 
nitudes of these stacking free energies actually depend on 
the identity of all four bases forming the two base pairs 
bracketing the stack and are dependent on temperature 
themselves. While their typical values are on the order 
of k^T at room temperature, the enthalpic and entropic 
contributions are each on the order of lOkBT. Thus, upon 
moderately increasing the temperature from room tem- 
perature to about 80°C, the stacking free energies be- 
come repulsive and the RNA molecule denatures. 

The stacking free energies account for most but not 
all of the entropic terms for a given secondary structure. 
There is an additional (logarithmic) "loop energy" term 
associated with the entropy loss of each closed loop of 
single-stranded RNA formed by the secondary structure, 
as well as the energy necessary to bend the single strand. 
All of these energy parameters have been measured in 
great detail ||ll| . When incorporated into an efficient dy- 
namic programming algorithm (to be described below), 
they can rather successfully predict the secondary struc- 
tures of many RNA molecules of up to several hundred 
bases in length |^, ^, |T^ . 

In this paper, we investigate the statistical properties 
of long, random RNA sequences far below the denatura- 
tion temperature. We are interested in generic issues such 
as the existence of a glass phase and various scaling prop- 
erties. Guided by experiences with other disordered sys- 
tems [gO| , we believe these generic properties of the sys- 
tem should not depend on the specific choice of the model 
details. Since the full model used in Refs |8| g, |l^ makes 
analytical and numerical studies unnecessarily clumsy, 
we will examine a number of simplified models, while pre- 
serving the most essential feature of the system, namely, 
the pattern of matches and mismatches between different 
positions of the sequence. 

As in the realistic model described above, we choose 
our reference energy to be the unbound state, so that 
each unbound base in a secondary structure is assigned 
the energy 0. We will neglect the logarithmic loop en- 
ergy, which is important very close to the denaturation 
transition ||2^ where the average binding energy is close 
to zero, but not far below the denaturation temperature 
where most bases are paired. Moreover, we will radically 
simplify the energy rules for base pairing: We neglect the 
stacking energies and instead associate an interaction en- 
ergy £ij with every pairing {i,j). Thus, 



is the total energy of the structure S. 



(1) 



Within this model, it remains to be decided how to 
choose the energy parameters £i.j's. One possibihty is 
to choose each of the bases bi . . .h^ randomly from the 
'alphabet' set {A, C, G, U} and then assign 



£*,■ 



bi~bj is a Watson-Crick base pair 
otherwise 



(2) 



with UmjMmm > being the match or mismatch energy 
respectively. Here, the value of Umm is actually not essen- 
tial as long as it is repulsive, since the two bases always 
have the energetically preferred option to not bind at all. 
Thus the energetics of the system is set by Um ■ In our 



numerical study to be reported in Sees. Ill and [V, we 
will primarily use this model^ with u^ — w,nm = 1- We 
will refer to this as the "sequence disorder" model. 

For analytical calculations, it is preferable to treat all 
the Sij^s as independent identically distributed random 
variables, i.e., to assume 



l<i<j<N 



(3) 



for the joint distribution function /^[{ei.j}] of all the £ij's. 
This choice neglects the correlations between e,; ^ and Si^k 
which are generated through the shared base 6^; it is an 
additional approximation on the model (g). However, 
we do not anticipate universal quantities to depend on 
such subtle correlation of the £ij's. This will be tested 
numerically by comparing the behavior of the model (0) 
with that of the model defined by Eq. (||) together with 

1 3 

p(£) = -(5(e + M,„) + -(5(e--Uinm)- (4) 

This distribution is chosen to mimic the random sequence 
model (||) with a 4-letter alphabet, but it does not contain 
any correlation between the different Ei^w's. We will refer 
to this model as defined by Eqs. (g) and (Q) as the "energy 
disorder" model. 

In the actual analytical calculations, we will go even 
one step further and take the Sij to be Gaussian random 
variables specified by 



Pie) 



1 



V2^ 



,-{e~Wy/2D 



(5) 



where e is the average binding energy and D is the vari- 
ance. In this model (referred to below as "Gaussian disor- 
der" model,) the parameter D provides us with a conve- 
nient measure of the disorder strength. Again, universal 



^ Note that as a toy model, there is no reason why the alphabet 
size of the bases needs to be 4 (as long as it is larger than 2 
as explained below). Indeed the alphabet size and the choice of 
the matching rule can be used as tuning parameters to change 
the strength of sequence disorder. But in our study, we choose 
to minimize the number of parameters and tune the effective 
strength of disorder by changing temperature. 



quantities should not depend on the choice of the distri- 
bution functions. We will test this directly by performing 
numerical studies for these Gaussian random energies, 
with 



1 



and D 



16 



("m + Ununf (6) 



chosen to match the first two moments of the distribution 
Eq. I). 

In contrast to prior numerical studies p7| , we do not 
exclude base pairing between neighboring bases (i, i -t- 1), 
i.e., we do not set a minimal allowed length for the hair- 
pins^. Setting a constraint on the minimal hairpin length 
would make the analytical study much more cumber- 
some. However, in the study by Pagnani et al. p7[ , it 
has been argued that the system will not be frustrated 
(and hence will not form a glass) without this additional 
constraint. We believe this is an artifact of the 2-letter 
alphabet used by Pagnani et al. in order to generate 
the binding energy e^j's via a rule similar to Eq. (||): It 
is simple to see that for any 2-letter sequence in which 
the like letters repel and unlike letters attract, one can 
always find the minimal total binding energy by pair- 
ing up neighboring bases of opposite types and removing 
them from the sequence if no additional constraints such 
as the minimal hairpin lengt h are enforced. As we will 
discuss in detail in Sec. [V A this is not a problem if the 
alphabet size is larger than 2. Thus, in our study, we 
use the sequence disorder model with a 4-letter alpha- 
bet, or the energy disorder model, without enforcing the 
minimal hairpin length constraint. While the minimal 
hairpin length (of 3 bases) is known for real RNA fold- 
ing, it should not change the universal properties of long 
RNA sequences. 



3. Partition function 

Once the energy of each secondary structure is defined, 
we can study the partition function 



ses(jv) 



-PE[S] 



(7) 



of the molecule where S(A^) denotes the set of all al- 
lowed secondary structures of a polymer of N bases, and 
(3 = l/ksT. To calculate this partition function, it is 
useful to study the restricted partition function Zi_j of 
the substrand from position i to position j of the RNA 
molecule. Given the model (|^), the restricted partition 



^ We did however repeat most of the numerical studies presented 
in this paper with a minimal hairpin size of 1. Since the results 
are qualitatively identical to the results of the simpler model 
presented here, we do not show this data. 



functions can be split up according to the possible pair- 
ings of position j. This leads to the recursive equa- 
tion H |15|^ I2I 



structures in Sec. IV. In our numerical investigations, we 



Z,,. 



j-i 



^^J-i + Yl Z^^k-i ■ e-^'"-^ ■ Zk+ij-i (8) 



k 



with Z(N) — Zi^N being the total partition function of 
the molecule. In terms of the arch diagrams introduced 
in Fig. g(a) this can be represented as 



(9) 



where the wavy lines stand for the restricted partition 
functions. This is easily recognized as a Hartree equa- 
tion. Since the restricted partition functions on the right 
hand side of this equation all correspond to shorter pieces 
of the RNA molecule than the left hand side, this equa- 
tion allows one to calculate the exact partition function of 
an RNA molecule of length N with arbitrary interactions 
Si J in 0{N^) time. This is accomplished by starting with 
the partition functions for single bases and recursively 
applying Eq. (ra), and is known as a dynamic program- 
ming algorithm M, 03]. This algorithm allows one to 
compute numerically the partition function involving all 
secondary structures, for arbitrary RNA molecules of up 
to N ^ 10, 000 bases. It also forms the basis of analytical 
approaches to the problem as we will see shortly. 



4- Physical observables 

Apart from the partition function itself, we will use ad- 
ditional observables in order to characterize the behavior 
of RNA secondary structures. One such quantity of in- 
terest is the binding probability Pt.j, i.e., the probability 
that positions i and j are paired given the £ij's, 



R 



•^ Zi+ij-iZ. 



j+i.i-i 



Zi, 



N 



(10) 



where Zi+ij-i is given by the recursion equation (g) 
and Zj^i^i-i is the partition function of the se- 
quence bj+ibj+2 ■ ■ ■ ^jv&i • ■ ■ ^1-2^2-1- The latter can 
be calculated as the quantity Zjj^i_N+i-i when apply- 
ing the recursion Eq. (||) to the duplicated sequence 
61 . . . bNbi . . . bN- Thus, all N{N — l)/2 such constraint 
partition functions can be calculated with the same re- 
cursion in 0{N^) time. The logarithms 



^F..j = 



'kBTlnPij 



(11) 



of these binding probabilities have a natural interpreta- 
tion: they can be read as the "pinching free energies", 
i.e., as the free energy cost of a pinch between positions i 
and j and the unperturbed state. We will make extensive 
use of this concept of pinched structures in our discus- 
sion of the low temperature behavior of RNA secondary 



will choose as a representative of all the pinching energies 
for different positions by 

AF{N) = AFi^^r/2+i (12) 

which is the free energy cost of the largest possible pinch 
that splits the molecule of length N into two pieces of 
length N/2 - 1 each. 

Another quantity which describes a secondary struc- 
tures is its "size profile" . As an intrinsic measure of the 
size of a given secondary structure S, we use the "ladder 
distance" hi{S) between the base at position 1 and the 
base at position i, which is the the number of pairings 
(or ladders) one has to cross to go from a pair involving 
base 1 to the base i; see Fig. ^. It can be defined for each 
secondary structure 5" as the total number of pairings 
{k, k') £ S that bracket i, i.e., 

his) EE \{{k, k') £S\k<i< k'}\. (13) 

This quantity can be very easily visualized as the 
"height" at position i of the mountain representation of 
the secondary structure S as shown in Fig. ^(b) . A quan- 
tity characterizing the full ensemble of secondary struc- 
tures is the thermal average {hi) of this size profile over 
all secondary structures with their respective Boltzmann 
factors; it can be straightforwardly calculated from the 
probabilities Pk k' as 



{h^ 



i-l N 

EE 

k—l k'^i 



Pk, 



(14) 



Since we expect all positions in the sequence to behave 
in a similar way, in our numerics we will summarize the 
properties of the size profile by the ladder distance from 
the first to the middle base, i.e., we will study 



{h) = (/lAr/2+i 



(15) 



as a quantity representing the overall "size" of an ensem- 
ble of secondary structures. 



B. The molten phase 

1. Definition of the molten phase 

If sequence disorder does not play an important role, 
we may describe the RNA molecule by replacing all the 
binding energies e^.j by some effective value £0 < 0- As 
we will see later, this will be an adequate description 
of our random RNA models at high enough tempera- 
tures (but before denaturation.) For the real RNAs, this 
provides a coarse grained description of repetitive, self- 
complementary sequences, e.g., CAGCAG...CAG, which 
arc involved in a number of diseases ||2^ . We will refer to 
RNA which is well described by this model without se- 
quence disorder as being in the "molten" phase. It serves 
as a starting point for modeling non-specific self-binding 
of RNA molecules, and its properties will form the basis 
of our study of the random RNA at low temperatures. 




FIG. 4: Definition of the "size profile" hi of a secondary RNA 
structure: The size profile measures the extension of the struc- 
ture if drawn as a planar diagram. As an intrinsic definition 
of hi which captures this notion of the size of a secondary 
structure at position i, we use the "ladder distance" of base 
i from the end of the molecule, i.e., the number of base pairs 
which have to be crossed when connecting position i to posi- 
tion 1 along the folded structure as indicated by the dashed 
line. 



2. Partition function 

Since in the absence of sequence disorder, the energy of 
a structure S depends only on the number of paired bases 
\S\ of this structure, we can write the partition function 
in the molten phase as 



Z{N)= J2 exp[-/3£o|5|] 
ses(Af) 



(16) 



The partition functions of the sub-strands Zij become 
translationally invariant and can be written as 



^'1,3 



GU 



(17) 



where G{N) is only a function of the length N. The 
recursion equation (M) then takes the form 



N-l 



G{N + 1) = G{N) + qJ2 '^('^) ■ <^(^ ~ ^)' 
where 



fc=i 



qEEe-'^^". 



Upon introducing the z-transform 

CX) 

G{z) = J2 G{N)2 



-N 



(18) 



(19) 



(20) 



the convolution can be eliminated and the recursion equa- 
tion turns into a quadratic equation 



zG{z)-l = Giz) + qG^{z) 



with the solution 



G(z) 



1 - v/(z - 1)2 - 4g 
2q 



(21) 



(22) 



Performing the inverse z-transformation in the saddle 
point approximation yields the expression |14, p6, [22 



GiN)^Ao{q)N-'o,l^^^q) 



(23) 



N=l 



in the limit of large N, with the exponent 6*0 = 3/2 and 
the non-universal quantities ZQ{q) = 1-f 2^ and Ao{q) ~ 
[(l + 2^)/47rg3/2]i/2. 

This result characterizes the state of the RNA where 
a large number of different secondary structures of equal 
energy coexist in the thermodynamic ensemble, and the 
partition function is completely dominated by the config- 
urational entropy of these secondary structures. While 
the result is derived specifically for the special case 
£jj = sq, we will argue below that it is applicable also 
to random Eij's at sufficiently high temperatures, in the 
sense that for long RNA molecules, the partition function 
is dominated by an exponentially large number of sec- 
ondary structures all having comparable energies (within 
0{kBT)) that are smoothly related to each other in con- 
figuration space. The latter is what we meant by the 
"molten phase" . 



3. Scaling behavior 

The exponent 6q = 3/2 is an example of a scaling 
exponent characteristic of the molten phase. This and 
other exponents can be derived in a geometric way by 
the "mountain" representation of secondary structures as 
illustrated in Fig. |3|(b). Each such mountain corresponds 
to exactly one secondary structure. In the molten phase, 
the weight of a secondary structure S is simply given by 
(/'■^l. This can be represented in the mountain picture by 
assigning a weight of q^' ^ iq every upward and downward 
step and a weight of 1 to every horizontal step. Since the 
only constraints on these mountains are (i) staying above 
the baseline, and (ii) returning to the baseline at the end, 
the partition function of an RNA of length N is then 
simply that of a random walk of TV steps, constrained to 
start from and return to the origin, in the presence of a 
hard wall at the origin, with the above weights {^ or 
1) assigned to each allowed step. This partition function 
is well-known to have the characteristic TV"^/^ behavior 
which we formally derived in the last section 124]. 

In this framework, it also becomes obvious why impos- 
ing a minimal hairpin length does not change the uni- 
versal behavior of RNA at least in this molten phase: If 
the minimal allowed size of a hairpin is s, this enforces a 



potentially strong penalty for the formation of a hairpin, 
since with every hairpin s bases are denied the possi- 
bility of gaining energy by base pairing. This tends to 
make branchings less favorable and thus leads to longer 
stems. However, this additional constraint translates in 
the mountain representation into the rule that an up- 
wards step may not be followed by a downwards step 
within the next s steps. This is clearly a local modifica- 
tion of the random walk. Thus, it does not change univer- 
sal quantities although the above mentioned suppression 
of branchings will require much longer sequences in or- 
der to observe the asymptotic universal behavior. For 
real RNA parameters, the crossover length is very long 
due to this effect. For example, it is several hundred nu- 
cleotides for the CAG repeat, and even longer for some 
other repeats. 

Another characteristic exponent describes the scaling 
of the ladder size (h) with the sequence length N. As al- 
ready mentioned in its definition (|l5|), (h) is equivalent to 
the average "height" of the mid-point of the sequence in 
the mountain picture. In the molten phase, the random 
walk analogy immediately yields the result 



{h)o - N^'\ 



(24) 



where (...)o denotes ensemble average in the molten 
phase. 

As should be clear from the coarse-grained view de- 
picted in Fig. E, the ensemble of RNA secondary struc- 
tures in the molten phase can be mapped directly to the 
ensemble of branched polymers. These branched poly- 
mers are rooted at the bases i = 1 and i = iV of the 
RNA. In this context, 6*0 = 3/2 is known as the config- 
uration exponent of the rooted branched polymer [E5J. 
Additionally from the result (|2J), we see that the ladder 
length of the branched polymer scales ^ as N^^^. Because 
of the very visual analogy of the secondary structures to 
branched polymer, we refer to the configurational entropy 
of the secondary structures as the "branching entropy" . 

Finally, the binding probabilities Pij defined in 
Eq. (|l^) only depend on the distance \i — j\ in the molten 
phase, i.e., P^j = p{\i — j\). The behavior of this function 
can be derived explicitly by inserting the result Eq. (E3) 
for the partition function into Eq. (KOtj. Alternatively, 
one just needs to recognize that p{i) corresponds in the 
random walk analogy to the first-return probability of a 
random walk after ^-steps. In either case, one finds the 
result 



p{£) 



ri{N-i)- 



(25) 



^ For a real branched polymer, each branch will have a spatial 
extension which scales as the square root of its ladder length (in 
the absence of excluded volume interaction). Then the typical 
spatial extension of a branched polymer scales as Af^'^, a well- 
known result for the branched polymer in the absence of self- 
avoidance fl25l. 



i.e., the return probability decays with the separation (. 
of the two bases as a power law with the configuration 
exponent 6*0 — 3/2. For the pinching free energy AF(A^), 
we simply set t — N/2 and obtain 



AFo^-fceTlnA^ 



(26) 



for large iV, i.e., it scales logarithmically in the molten 
phase. This logarithmic dependence merely reflects the 
loss in branching entropy due to the pinching constraint 
and is a manifestation of the configuration exponent ^o = 

3/2. 



III. EFFECT OF SEQUENCE RANDOMNESS: 
HIGH TEMPERATURE BEHAVIOR 

There are in principle three different scenaria for the 
behavior of long random RNA sequences, (i) Disorder is 
irrelevant at any finite temperature, so that the molten 
phase description presented in Sec. [I B applies to long 
RNAs at all temperatures, (ii) Disorder is relevant at all 
temperatures, and the molten phase description would 
be completely inadequate, (iii) There is a finite temper- 
ature Tg above which the molten description of random 
RNA is correct, while below Tg a qualitatively different 
description is needed. In accordance with the statistical 
physics literature, we will refer to the non-molten phase 
as the glass phase, and Tg as the glass transition. The 
purpose of the study is to determine which of these three 
scenaria is actually realized, and to characterize the glass 
phase if cither (ii) or (iii) is realized. 

In this section, we study the high temperature behavior 
and demonstrate that the molten phase is stable with 
respect to weak sequence disorder. This e nsur es that the 
molten description of RNA given in Sec. II B is at least 
valid at high enough temperatures, thereby ruling out 
scenario (ii). We will address the question of whether 
there is a glass phase at low but finite temperatures in 
Sec.liV. 



A. Numerics 

Before we engage in detailed calculations, we want 
to convince ourselves with the help of some numerics 
that weak disorder does not destroy the molten phase. 
To t his en d, we study the observables introduced in 
Sec. [I A 4 . We generate a large number of disorder 
configurations, i.e., interacti on ene rgies e^j using the 
3 models introduced in Sec. II A 2| : sequence disorder, 



energy disorder, and Gaussian disorder as described by 
Eq. (|), Eqs. (|) and (|), and Eqs. (|) and (|) respec- 
tively, with M,ji = Unim = 1- Then, we calculate the 
observables {h) and IS.F{N) for each disorder configu- 
ration at the relatively large temperature of fc^T = 2 
and average the obtained values over many disorder con- 
figurations. In order to keep the numerical effort man- 



ageable, we average over 10,000 random sequences for 
N e {10,20,40,80,160,320}, over 2,000 sequences for 
iV = 640, and over 1,000 sequences for iV = 1,280 and 
iV = 2,560. 

Fig. shows the results; disorder averaged quantities 
are denoted by a n o verUne throughout the text. We see 
that the data for (h) foUows a power law with a fitted ex- 
ponent (h) ^ A^*^'^^, with the exponent value decreasing 
for larger A^'s. This result is consistent with the predic- 
tion Eq. (124) for the molten phase. Also, the pinching 
free energy follows the predicted logarithmic behavior 
Eq. ( p6| ) without any noticeable difference between the 
three choices of disorder. Taken together, these results 
indicate that the three models of disorder belong to the 
same universality class, i.e., the molten phase description 
of the uniformly attracting RNA, at high temperatures. 



B. The replica calculation 
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FIG. 5: Scaling in the molten phase: These two plots show 
the dependence of several characteristic quantities of RNA 
secondary structures on the length N of the sequence at 
fcflT = 2. Each plot shows data for three different choices 
of disorder according to Eqs. (pl), M), and (0). (a) shows the 
scaling of the average size (/i) and the dashed line is the best 
fit (/i) ~ N'^-^'^ to a power law. (b) shows the free energy of 
the largest pinch as defined in (|l2). The dashed line is up 
to an additive constant the logarithmic behavior | x 2 In A' 
predicted in Eq. (M). The statistical fluctuations are smaller 
than the size of the symbols in both plots. All plots suggest 
that the behavior of RNA secondary structures at high tem- 
peratures is well described by the molten phase picture and 
independent of the disorder. 



J 



Now, we will establish the stability of the molten 
phase against weak disorder by an analytical argument. 
We will use Gaussian disorder characterized by Eqs. (|^) 
and (P). As we have shown above, the different mi- 
croscopic models of binding energies all yield the same 
scaling behaviors. With the uncorrelated Gaussian en- 
ergies, it is possible to perform the ensemble average of 
the partition function Z" of n RNA molecules sharing 
the same disorder. The disorder-averaged free energy 
can then in principle be obtained via the "replica-trick" 
InZ = lim„^o(^" ^ l)/"i by solving the n-replica prob- 
lem [|6|. 

The n-replica partition function can be written down 
formally as 
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where 

q = exp(-/?e+i/32i)j and q = exp{(3^D) (27) 

are the two relevant "Boltzmann factors" . This effective 
partition function has a simple physical interpretation: 
It describes n RNA molecules subject to a homogeneous 



attraction with effective interaction energy Eq 



h0D 



between any two bases of the same molecule. As before, 
this effective attraction is characterized by the factor q. 
In addition, there is an inter-replica attraction character- 
ized by the factor g for each bond shared between any pair 
of replicas. The inter-replica attraction is induced by the 
same sequence disorder shared by all replicas. For ex- 
ample, if the base composition in one piece of the strand 
matches particularly well with another piece, then there 
is a tendency to pair these pieces together in all replicas. 
Thus, the inter-replica attraction can potentially force 
the different replicas to "lock" together, i.e., to behave 
in an correlated way. Indeed, the distribution of inter- 
replica correlations, usually measured in terms of "over- 
laps" , is a common device used to detect the existence of 
a glass phase in disordered systems [ p7| . 

The full n-replica problem is difficult to solve analyti- 
cally. We will examine this problem in the regime of small 
D, aiming to resolve the relevancy of disorder in a per- 
turbative sense. Since the lowest order term of the fully 
random problem in a perturbation expansion in D corre- 
sponds to the two-replica (n = 2) problem, we will focus 
on the latter in order to study the small- Z? behavior of 
the full problem. The solution of the two-replica problem 
will also illustrate explicitly the type of interaction one 
is dealing with, thereby providing some intuition needed 
to tackle the full problem. It turns out that the two- 
replica problem can be solved exactly. Here, we outline 
the saline features of the solution. Details of the calcu- 
lation and analysis are provided in the Appendices. We 
will find that the two-replica system has a phase transi- 
tion between the molten phase in which the two replicas 
are uncorrelated and a nontrivial phase in which the two 
replicas are completely locked together in the thermody- 
namic limit. The transition occurs at a finite temperature 
Tc{D) which approaches zero as I? ^ 0. Thus, the effect 
of weak disorder is irrelevant at finite temperatures. 

Let us denote the two-replica partition function Z^ for 
two strands each of length TV by G{N + l;q), where we 
keep the dependence on q implicit. Then, 



G{N + l;q) 



E 

Si,S2es(Jv) 



q\S^\ + \S2\q\S,nS,\^ (28) 



The key observation which allows us to solve the two- 
replica problem is that for each given pair of secondary 
structures, the bonds shared by two repHcas (hereafter 
referred to as "common bonds" ) form a valid secondary 
structure by themselves (see Fig. ^.) Thus, we can rear- 
range the summation over the pairs of secondary struc- 
tures in the following way: We first sum over all possible 



secondary structures of the common bonds. For a given 
configuration of the common bonds, we then sum over 
the remaining possibilities of intra-replica base pairings 
for each replica, with the constraint that no new common 
bonds are created. 




FIG. 6: Grouping of two RNA structures according to their 
common bonds: Each pair of RNA secondary structures like 
the one on the left hand side can be classified according to the 
bonds which are common to both structures (open circles.) 
These common bonds form by themselves an RNA secondary 
structure (right hand side.) Thus, the sum over all pairs of 
secondary structures can be written as the sum over all pos- 
sible secondary structures of the common bonds. The weight 
of each common bond structure is then given by the inter- 
action energies of common bonds and the summation over 
all possibilities of arranging non-common bonds in the given 
common-bond structure. Since non-common bonds have to 
be compatible with the common bond structure, the latter 
sum factorizes into independent contributions of all the loops 
of the common bond structure (grey circles.) Each such con- 
tribution solely depends on the number of non-common-bond 
bases in each of these loops. 



Note that the common bonds partition the diagram 
into a number of "bubbles"^, shown as the shaded regions 
in Fig. 0. Due to the exclusion of pseudo-knots from the 
valid secondary structures, only bases belonging to the 
same bubble can be paired with each other. Thus, the 
two-replica partition function can be written as 



g{N + i-q}= J2 (9'5)i^i n 2'(^» 

ses{N) bubble i of S 



1), (29) 



where the factor q^q is the weight of each common bond, 
and Qi{£i -I- 1) is the sum of all possible intra-replica 
pairings of of the j*^ bubble of £i bases in S, with the 
restriction that there are no common bonds. 

It should be clear that Qi neither depends on the num- 
ber of stems branching out from the bubble i nor the 
positions of these stems relative to the bases within the 



* The two ends of the sequence must also belong to a bubble if 
they are not common bonds 
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bubble. It depends on S only through the number of 
bases £i in the bubble and is given by a single function Q 
independent of i. This function can be written explicitly 
as 



Q{i + l)^ 



^1 n ^2 = 



IS1I+IS2I 



(30) 



With Eqs. ( |29| ) and (30), the two-replica problem is re- 
duced to an effective single homogeneous RNA problem, 
with an effective Boltzmann weight q^q for each pairing, 
and an effective weight Q for each single stranded loop. 
As described in Appendix ^ this problem becomes for- 
mally analogous to that of an RNA in the vicinity of 
the denaturation transition, with Q being the weight of 
a single polymer loop fluctuating in 6-dimensional em- 
bedding space. The competition between pairing energy 
and the bubble entropy leads to a phase transition for 
the two-replica problem, analogous to the denaturation 
transition for a single RNA. 

The details of this transition are given in Appendix |^, 
where the partition function (p9h is solved exactly. The 
exact solution exploits the relation 



QiN)^g{N;q = 0) 



(31) 



which follows from the definitions (Um and (po|), and 
turns Eq. ( p9| ) into a recursive equation for Q. The solu- 
tion is of the form 



g{N;q)^N-''Ciq,q) 



(32) 



for large N, with two different forms for 9 and ( depend- 
ing on whether q is above or below the critical value 



1 



1 



q'EN=iNG^N)il + 2^) 



-2{N-1) ' 



(33) 



Here G{N) is the molten phase partition function, whose 
large N asymptotics is given by Eq. (23) and whose val- 
ues for small N can be calculated explicitly from the 
recursion Eq. (|l8|). Thus, the actual value of qc can be 
found for any given q. 

For q < qc, we have 6 = 3 and 



C^il + 2^f + q\q-l)giiq), 



where 



51(g) =£G2(iV)(l + 2V^)- 



2N 



(34) 



(35) 



N=l 



according to Eqs. (BIO) and (B12). In this regime, the 
two-replica partition function Q is essentially a product 
of two single-replica partition functions G. Compared to 
Eq. (p3|), we can identify 9 as 29o, and C as a modified 
version of Zq = (1 -I- 2^)^. Since there is no coupling of 
the two replicas beyond a trivial shift in the free energy 
per length, / = — fc^Tln^, we conclude that the disorder 



coupling is irrelevant. Hence the two-replica system is in 
the molten phase in this regime. 

For q > qc, we have 9 = 3/2 and C is given as the im- 
plicit solution of an equation involving only singic-rcplica 
partition fmictions as shown in Eqs. ( |B17D and (B20). 
Here, the partition function of the two-replica system is 
found to have the same form as that of the single-replica 
system in (^3|). This result implies that the two repli- 
cas are locked together via the disorder coupling, and the 
molten phase is no longer applicable in this regime. 

Of course, as already explained above, only the weak- 
disorder limit (i.e., f3'^D ^ 1) of the two-replica problem 
is of relevance to the full random RNA problem. In this 
limit, g sa 1+(3^D while qc is found by evaluating Eq. (p3) 



with q 



-Pe 



It can be easily verified that gc > 1 as 



long as q is finite. Thus in the weak disorder limit, we 
have q < qc, indicating that the molten phase is an appro- 
priate description for the random RNA. Unfortunately, 
the two-replica calculation cannot be used in itself to de- 
duce whether the molten phase description breaks down 
at sufficiently strong disorder or low temperature. Based 
on this analysis, we cannot conclude whether the type of 
phase transition obtained for the two-replica problem is 
present in the full problem. 



IV. EFFECT OF SEQUENCE RANDOMNESS: 
LOW TEMPERATURE BEHAVIOR 

Having established the validity of the molten phase 
description of random RNA molecules at weak disorder 
or high temperatures, we now turn our focus onto the 
low temperature regime. First, we will give an analyt- 
ical argument for the existence of a glass phase at low 
temperatures. Then, we will present extensive numeri- 
cal studies confirming this result and characterizing this 
glass phase. 



A. Existence of a glass phase 

We will start by showing that the molten phase cannot 
persist for all temperatures down to T = 0+. To this 
end, we will assume that long random RNA is in the 
molten phase for all temperatures, i.e., that the partition 
function for any substrand of large length L ^ 1 is given 
by 



Z{L) = A{T)L-2 exp[-(3fo{T)L] 



(36) 



with some effective temperature-dependent prefactor 
A{T) and free energy per length fo{T). Then, we will 
show that this assumption leads to a contradiction below 
some temperature T* > 0. This contradiction implies 
that the molten phase description breaks down at some 
finite Tg > T*. To be specific, we will consider the se- 
quence disorder model (0) in this analysis. 

The quantity we will focus on is again the free en- 
ergy AF{N) of the largest possible pinch. Under the 
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N/2+1 




N/2+1 



FIG. 7: Finding a good match in a RNA sequence: (a) shows 
the position of two pieces with exactly complementary bases 
one of which is between positions 2 and N/2 and the other 
of which is between positions N/2 + 2 and A'^. Such a piece 
of length I r^lnN can be found for almost all sequences, (b) 
shows how restricting configurations to those in which the 
good match forms Watson-Crick base pairs splits the molecule 
into two loops which can still form base pairs within the loops 
independently from each other. 



assumption that the random sequences are described by 
the molten phase, it is given by 



AF{N) = -keTlnN 



(37) 



for large N and all T independently of the values of the 
effective prefactor A{T) and the free energy per length 
/o(T) (see Eq. {^.) 

On the other hand, we can study this pinching free 
energy for each given sequence of bases drawn from the 
ensemble of random sequences. For each such sequence, 
we can look for a continuous segment of £ <C iV Watson- 
Crick pairs bi-bj, bi+i~bj^i, . . . , bi+i-i-bj-i+i where the 
bases bi . . . bi+g-i are within the first half of the molecule 
and the bases bj-e^+i ■ ■ -bj are in the second half (sec 
Fig. 0(a).) For random sequences, the probability of find- 
ing such exceptional segments decreases exponentially 
with the length i, with the largest i in a sequence of 
length N being typically 



£ = X-'^lnN. 



(38) 



For exact complementary matches, the proportionality 
constant is known to be A = In 2 [^ . 

Now, we calculate the pinching free energy 



ZA_r yl\ j -^pinched -^unpinchi 



cd 



(39) 



by evaluating the two terms separately. The partition 
function for the unpinched sequence contains at least all 
the configurations in which the two complementary seg- 
ments bi . . . foi+f-i and 5j-£+i ■ ■ -bj are completely paired 
(see Fig. 1(b)). Thus, 



F, 



unpinched 



<i^n 



(40) 



where Fpaircd is the free energy of the ensemble of struc- 
tures in which the two complementary segments are 
paired. The latter is the sum of the free energy of the 
paired segments and those of the two remaining sub- 



6j+i . . . bi-i (wrapping around the end of the molecule) 
of length L2~N+i—j — l, i.e., 

Fpaircd=-^U,n+(iV-2£)/o+|fcBT[ln(Li)+ln(L2)]. (41) 

The free energy i^pinchod in the presence of the pinch is, 
by the assumption of the molten phase, the interaction 
energy of the pinched base pair 6i-6jv/2+i plus the molten 
free energy of the substrand 62 ■ ■ • bj^/2 and the molten 
free energy of the substrand 67V/2+2 ■ ■ -bN, i-e., according 
to Eq. (^) 

Fpinchod = fo{T)N + 2 X ^keTlnN (42) 

up to terms independent of N. Combining this with 
Eqs. (H), (igl), and (gl|), we get 

AF{N)>^kBT[2\nN-\nLi-\nL2] + i[u,r, + 2fo{T)]. 

. (^^) 
Using the result (pq) and that Li and L2 are typically 

proportional to N, we finally obtain 

AF{N) > [u,n + 2/o(r)] A-i In N (44) 

for large N. This is only consistent with Eq. (BTh if 



^keT > X- 



K + 2/o(T)]. 



(45) 



strands b. 



'i+e 



. bj^i of length Li 



J 



2^ -f 1 and 



Now, fo{T) is a free energy and is hence a monoton- 
ically decreasing function of the temperature. Thus the 
validity of the inequality ( p7\ ) depends on the behavior 
of its right hand side at low temperatures. As T ^ 0, 
the inequality can only hold if cr = u„^ + 2fo{T = 0) < 0. 
Since the average total energy at T = is Mm times 
the average total number of matched pairs of a random 
sequence, then 2/o(0) is simply the fraction of matches 
and (7 is the fraction of bases not matched (for Um = !•) 
Clearly, a cannot be negative, and the inequality ( |37| ) 
must fail at some finite temperature unless a = 0. 

We can make a simple combinatorial argument to show 
that in most cases the fraction a of unbound bases must 
strictly be positive. To illustrate this, let us generalize 
the "alpha bet size" of the sequence disorder model of 
Sec. II A 2 from 4 to an arbitrary even integer K > 2. 
We will still adopt the energy rule (||) where each of the 
K bases can form a "Watson- Crick" pair exclusively with 
one other base. Let us estimate the number of possible 
sequences for which the fraction of unmatched bases a 
is zero in the limit of long sequence length A^ at T = 0. 
Since at T = 0, only Watson-Crick (W-C) pairs can be 
formed, we only need to count the number of sequences 
for which the fraction of W-C paired bases is 1. This 
means that except for a sub-extensive number of bases, 
all have to be W-C paired to each other. From the moun- 
tain picture (Fig. ra), it is clear that the number of possi- 
ble secondary structures for such sequences must scale 
like 2''^, since the fraction of horizontal steps is non- 
extensive so that at each step, there are only the pos- 
sibilities for the mountain to go up or down. For each of 
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the N/2 pairings in one of these 2 structures, there are 
K ways of choosing the bases to satisfy the pairing. So 
for each structure, there are K^ ^"^ ways of choosing the 
sequence that would guarantee the structure. Since there 
are a total of K^ sequences, it is clear that the fraction of 
sequences with all (but a sub-extensive number of) W-C 
pairs becomes negligible if 



(2^f < K 



N 



(46) 



Thus, for K >Q,we must have cr > 0. 

For X = 2, the left hand side of Eq. (46) grows faster 
than its right hand side. This reflects the absence of 
frustration in this simple two-le tter model as already dis- 
cussed at the end of Sec. II A2| . One way to retain frus- 
tration is to introduce additional constraints, e.g., the 
minimal hairpin length used in Ref. [|7|. With this con- 
straint, a structure with a sub-extensive number of un- 
matched bases can only contain a sub-extensive number 
of hairpins. In the mountain picture, this means that 
except for a sub-extensive number of steps, there is only 
one choice to go up or down at every step. This changes 
Eq. (^) to K'^/'^ < K^ . It ensures frustration since 
a > for all K. Since a minimal length of 3 bases is 
necessary in the formation of a real hairpin, real RNA is 
certainly frustrated by this argument. The random se- 
quence model which we study in this paper is marginal 
since X = 4 and there is no constraint on the minimal 
hairpin length. In this case, all the prefactors on the two 
sides of Eq. ( [46|) (e.g., the overcounting of sequences that 
support more than one structure) must be taken into ac- 
count. We will not under take this effort here, but will 
verify numerically in Sec. p^VC| that cr > also in this 
case. 

In all cases with ct > 0, it follows that there is some 
unique temperature T* below which the consistency con- 
dition (Eq) breaks down, implying the inconsistency of 
the molten phase assumption in this regime. From 
this we conclude that there must be a phase transition 
away from the molten phase at some critical temperature 
Tg > T* > 0. The precise value of the bound T* depends 
on A which in turn depends on the stringency of the con- 
dition we impose on the rare matching segments. For 
instance, if we relax the condition of exact complemen- 
tarity between two segments to allow for matches within 
each segment, then the constant A will be reduced from 
In 2 and the value of T* will increase. This will be dis- 



pair is penalized with a factor e*° relative to a Watson- 
Crick base pair, and a non Watson-Crick base pair is pe- 
nalized even more. Thus, only the minimal energy struc- 
tures contribute (for the sequence lengths under consid- 
eration here), and we ma y regard this effectively as at 
T = 0. As in Sec. Ill A , we average over 10,000 real- 
izations of the disorder for N e {10, 20, 40, 80, 160, 320}, 
over 2, 000 realizations for N — 640 and over 1, 000 real- 
izations for N e {1280,2560}. 

Fig. H shows the results for the ladder size {h) of the 
structures for the three models of disorders. The lad- 
der size still scales algebraically with the length of the 
sequences, with numerically determined exponents rang- 
ing from "(Xy - iVO-65 to(h)r^ A^o.69 Jq^. ^-^e different 
choices of disorder. They are clearly different from the 
square root behavior (dotted line) expected of the molten 
phase. Thus this result reafhrms our expectation that the 
secondary structures of a random RNA sequence at zero 
temperature indeed belongs to a phase that is different 
from the molten phase. 
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FIG. 8: Scaling of the average size (In) of secondary struc- 
tures in the low temperature phase with the length A^ of the 
sequences: The plot shows data for three different choices of 
disorder according to Eqs. (|), (|), and (|) at T = 0.025. 
The average system size follows a power law. However, the 
best fit of the data for sequence disorder at A'^ > 160 to a 
power law indicated by the dashed line leads to an exponent of 
(/i) ~ jY""'®®. The corresponding fits for energy and Gaussian 
disorder yield exponents of (/i) ~ JV°®® and (/i) ~ jv"'®^, re- 
spectively. This is distinctively different from the square root 
behavior of the molten phase indicated by the dotted line. 
The comparison of this plot with its counterpart in Fig. H(a) 
suggests that the behavior of RNA secondary structures at 
low temperatures is different from the molten phase. 



B. Characterization of the glass phase 



The above argument does not provide any guidance on 
the properties of the low temperature phase itself. In or- 
der to characterize the statistics of secondary structures 
formed at low temp eratures, we re-do the simulations re- 
al /cbT = 0.025 in energy units set by 
At this temperature, an unbound base 



ported in Sec. [Ill A 



1. 



1. A criterion for glassiness 

A key question in characterizing the thermodynamic 
properties of disordered systems is whether the zero- 
temperature behavior persists for a range of finite tem- 
peratures. If it does, then the system is said to have a 
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finite-temperature glass phase. One way to address this 
question is to study the overlap between different replicas 
of the RNA molecule as mentioned earlier. If a non-trivial 
distribution of these overlaps with significant weight on 
large overlaps persists into finite temperatures, then the 
finite-temperature glass phase exists. This approach was 
taken by previous numerical studies |13, n% Iq, M . Un- 
fortunately, the results are inconclusive and even contra- 
dictory due to the weakness of the proposed phase transi- 
tion — only the fourth temperature derivative of the free 
energy seems to show an appreciable singularity. More- 
over, due to limitations in the sequence lengths probed, 
it was difficult to get a good estimate of the asymptotic 
behavior of the overlap distribution. 



In our study, we adopt a different approach based on 
the droplet theory of Fisher and Huse |2^. In this ap- 
proach, one studies the "large scale low energy excita- 
tions" about the ground state. This is usually accom- 
plished by imposing a deformation over a length scale 
£ ^ 1 and monitoring the minimal (free) energy cost of 
the deformation. This cost is expected to scale as £" for 
large i. A positive exponent uj indicates that the de- 
formation cost grows with the size. If this is the case, 
the thermodynamics is dominated by a few low (free) en- 
ergy configurations in the thermodynamic limit, and the 
statistics of the zero-temperature behavior persists into 
finite temperatures. On the other hand, if the exponent 
uj is negative, then there are a large number of configura- 
tions which have low overlap with the ground state but 
whose energies are similar to the ground state energy in 
the thermodynamic limit. At any finite temperature T, 
a finite fraction of these configurations (i.e., those within 
0{kBT) of the ground state energy) will contribute to 
the thermodynamics of the system. The zero tempera- 
ture behavior is clearly not stable to thermal fiuctuations 
in this case, and no thermodynamic glass phase can exist 
at any finite temperature. The analysis of the previous 
section indicate the existence of a glass phase; thus we 
expect to find the excitation energy to increase with the 
deformation size. 



It should be noted that this criterion for glassiness is 
purely thermodynamical in nature and does not make 
any statement about kinetics. A system which is not 
glassy thermodynamically can still exhibit very large bar- 
riers between the many practically degenerate low en- 
ergy configurations, leading to a kinetic glass. A study 
of the kinetics of RNA, e.g., in terms of barrier heights, 
is naturally dependent on the choice of allowed dynami- 
cal pathways to transform one RNA secondary structure 
into another one IRO, m^, 32, BSl. Since the latter is a 



2. Droplet excitations 

According to the criterion for glassiness just presented, 
our goal is to determine the value of the exponent cu for 
random RNAs numerically. To this end, the choice of 
"large scale low energy excitations" needs some careful 
thoughts. As in every disordered system, there is a very 
large number of structures which differ from the mini- 
mal energy structure only by a few base pairs and which 
have an energy only slightly higher than the minimum 
energy structure. These structures are clearly not of in- 
terest here. Instead we need to find a controlled way of 
generating droplet excitations of various sizes. 

We prop ose to use the pinching method introduced in 
Sec. II A4| as a way to generate the deformation, and re- 
gard the difference between the minimal energy pinched 
structure and the ground state structure as the droplet 
excitation. There are several desirable features about 
these pinch-induced deformations: First, it gives a con- 
venient way of controlling the size of the deformation. If 
(i, i') is a base pair that is bound anyways in the ground 
state, pinching this base pair does not have any effect 
and AFi^i' = 0. If we pinch base i with some other base 
J ^ i' , then we force at least a partial deformation of the 
ground state, for bases in the vicinity of i, i', and j. This 
is illustrated in Fig. with the deformed region indicated 
by the shade. As we move the pinch further away from 
the ground state pairing, we systematically probe the ef- 
fect of larger and larger deformations (provided that a 
pinch only induces local deformation as we will show). 
Second, the minimal energy or the free energy of the 
secondary structures subject to the pinch constraint is 
easily calculable numerically by the dynamic program- 
ming algorithm as shown in Eqs. (Iin) and (O). Third, 
the pinching of the bases in a sense mimics the actual 
dynamics of the RNA molecule at low temperatures. In 
order for the molecule to transform from one secondary 
structure to another at a temperature where all match- 
ing bases should be paired, the bases have to make local 
rearrangements of the secondary structures much like the 
way depicted in Fig. ^ [g2[. Thus, the pinching energy 
provides the scale of variation in the local energy land- 
scape for such rearrangements^. Finally, "pinching" of 
a real RNA molecule can be realized in the pulling of a 
long molecule through a pore Q . 

A key question to the utility of these pinch deforma- 
tions is whether the deformation is confined to the local 
region of the pinch as depicted in Fig. or whether it 
involves a global rearrangement of the structure. To test 



highly non-trivial problem, we will restrict ourselves to 
thermodynamics and use the droplet picture explained 
above as our criterion for the existence of a glass phase. 



'^ While local rearrangements will only proceed by forming differ- 
ent Watson-Cricls base pairs, we will in our study determine the 
pinching free energies for all pinches irrespective of the fact if 
they are a Watson-Crick base pair or not. Since we take the 
ensemble average over many sequences this amounts only to an 
irrelevant constant contribution (eij ) — Um =s — Mm to the pinch 
free energies. 
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FIG. 9: Deformation of a minimal energy structure by pinch- 
ing: The two bases i and i' (open circles) form a base pair 
in the indicated minimal energy structure (a). Thus, forcing 
these two bases to be bound by pinching does not affect the 
structure at all. Pinching of (i, j) on the other hand will lead 
to a local rearrangement (shaded region) of the structure as 
shown in (b) . The effect of such a pinch depends on the num- 
ber of base pairs of the minimal energy structure the pinch 
is incompatible with. As indicated by the arrow in (a), this 
number is given by the ladder distance hij between i and j; 
in this example hij — 3. 



this aspect, we numerically study the changes in pinch 
free energy as a function of the "size" of a pinch. Here, 
the definition of the pinch size needs some thought. Con- 
sider a specific sequence whose minimal energy structure 
is S* . If the binding partner of base i is base i' in the 
minimal energy structure, a natural measure for the size 
of a pinch (i, j) ^ S* with i < j < i' would be the lad- 
der distance hij between base i and base j; see Fig. g. 
From the mountain representation (Fig. ^(b)), it is easy 
to see that this is just the difference of the respective lad- 
der distances of base j and base i from base 1 as defined 
in Eq. (|l3|), i.e., hi^j = hj{S*) - h,{S*). To find how 
the excitation energy depend on the pinch size, we just 
need to follow how the pinching free energy AFij's de- 
pend statistically on the size hij's. To do so, we choose 
a large number of random sequences, and determine the 
minimal energy structure S* for each of these sequences. 
Then, we compute the pinch free energies AFij and the 
pinch size hij for all possible pinches («,j) for each se- 
quence. Afterwards, we average over all AF^^'s with the 
same pinch size hij over all of the generated random 
sequences to obtain the function 



5F{Sh) = J2 ^K 



j OShMi,. 



E' 



5h,hi,. 



(47) 



The results obtained at fc^T = 0.025 for a large range of 
sequence sizes from A^ = 80 to A^ = 2, 560 are shown in 
Fig. M(a). We see that the data for different A^'s fall on 
top of each other for small Sh's, with 



SF - (Sh) 



0.27 



(48) 



This behavior explicitly shows that the pinch deforma- 
tion is a local deformation. In particular, we see that 
for small J/i's, the free energy cost is independent of the 
overall length A^ of the molecule. 
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FIG. 10: Pinching free energy as a function of the number Sh 
of minimal energy structure base pairs that are incompatible 
with the pinch for random sequence disorder at T = 0.025: 
(a) shows the raw data. For small Sh the pinching free is 
independent of the length A'^ of the molecule and obeys a 
power law SF{Sh) ~ (5/i)°'^^ (dashed line.) This is consistent 
with the expectation that pinching at small Sh leads to lo- 
cal rearrangements of the secondary structure. The apparent 
non-monotonic behavior at large Sh is due to the small num- 
ber of sequences in which such a value of Sh is realized, (b) 
shows the same data, but the scaling of Sh with N is chosen in 
accordance with Fig. H while the scaling of the pinching free 
energy is chosen in accordance with the power law dependence 
estimated above. 



It is interesting to see at which Sh the entire sequence 
is involved. One expects ShmaK ^ {h) ~ ]\[^-^^ since (h) 
gives the typical scale of the maximum ladder length. To 
test this, we normalized Sh by iV°'^^ and 6F by TV"-^^ 
(such that the relation ( [48|) is preserved for small 5/i's). 
The result is shown in Fig. lO^(b). We see that the data 
is approximately collapsed onto a single curve, indicat- 
ing that pinching is indeed a good way of imposing a 
controlled deformation from the ground state. 



3. A marginal glass phase 

The scaling plot of Fig. |l^(b) indicates strongly that 
the energy associated with the pinch deformation in- 
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creases with the size of the deformation, i.e., 5F ~ 
{5hf'^'' ~ A^^'"'^^. However, the effective exponent in- 
volved is small, making the result very susceptible to fi- 
nite size effects. In order to decide on the glassiness of 
the system, we want to focus on the energy scales associ- 
ated with the largest pinch deformations from the ground 
state. Assuming that there is only a single energy scale 
associated with large pinches, we again study the free en- 
ergy AF{N) of the largest pinch as defined in Eq. ( p^ ) 
and average this over the ensemble of sequences^. 

The results are shown in Fig. 0(a) for the three mod- 
els of disorders. Although a weak power law dependence 
of AF{N) on N cannot be excluded, the fitted expo- 
nents obtained for the three models are different from 
each other, ranging from 0.09 to 0.19. This is a strong 
sign of concern, since the exponents are expected to be 
independent of details of the models. In Fig. [Tl|(b), the 
same data is plotted on a log-linear scale. The data fall 
reasonably on a straight line for each of the models (es- 
pecially for large iV's), suggesting that the pinching free 
energy may actually increase logarithmically with the se- 
quence length, similar to what is expected of the behavior 
in the molten phase ! However in this case, the prefac- 
tor of the logarithm depends on the choice of the model 
and is much larger than the factor ^ksT expected of the 
molten phase; see Eq. (p6|). For example, for the nu- 
merical data obtained at ksT = 0.025, the prefactor is 
approximately 0.9 for the sequence disorder model, while 
the expected slope for the molten phase is 0.0375 at this 
temperature. Having different logarithmic prefactors for 
the different models is not a concern, since a prefactor 
is a non universal quantity. Thus, our numerical results 
favor a logarithmically increasing pinch energy, with a 
prefactor much exceeding fc^T at low temperature. 

What does this tell us about the possible glass phase 
of the random RNA? In order to answer this question, we 
should remind ourselves that rather special deformations 
are chosen in this study. For our choice of pinch defor- 
mations, we observe a logarithmic dependence of the gap 
between the ground state energy and the energy of the 
excited configurations on the length of the sequence or 
deformation. This corresponds to the marginal case of 
the droplet theory where the exponent to vanishes. Since 
the pinching free energies are increasing with length, we 
cannot exclude a glass phase in the case w = 0. We can 
say, though, that the increase of the excitation energy 
with length is at most a power law with a very small 
exponent and most probably even less than any power 
law. Therefore, a possible glass phase of RNA has to 



In order to ensure that choosing the largest pinch as a rep- 
resentative is justified, we studied in addition the ensem- 
ble average of the maximal pinch free energy A_Fniax(Af) = 
maxi<i<j<jv A_Fij'. This quantity yields an upper bound es- 
timate of the energy associated with large scale pinches for each 
sequence length iV. We find AFmax(Af) and AF{N) to have the 
same scaling behavior, and thus present only data for the latter. 
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FIG. 11: Pinching free energies at low temperature: (a) shows 
a double logarithmic plot with fits to power laws for the data 
with iV > 160. The exponents are iV''■^^ iV°■''^ and iV°-^^ for 
the sequence, energy, and Gaussian disorder respectively, (a) 
shows the same data in a single logarithmic plot together with 
the best fits to a logarithmic dependence on A''. The statisti- 
cal error of the data points is about the size of the symbols 
for large A*' and smaller than that for A'' < 640. Due to the 
apparent systematic bending of the data in the double loga- 
rithmic plot (b) we conclude that a logarithmic dependence 
fits the data better although we cannot exclude a power law 
behavior with a very small power. 



be very weak. If it turns out that the excitation energy 
is indeed a logarithmic function of length, with a non- 
vanishing prefactor as T ^ as our numerics suggest, 
then the low-temperature phase would be categorized for- 
mally as a marginal glass phase, analogous to behaviors 
found in some well-studied model of statistical mechan- 
ics |3g, ^, 0. In any case, we should note that the 
actual difference in the excitation energy is only a factor 
of 4 across two-and-a-half decades in length. Thus the 
glassy effect will be weak for practical purposes. On the 
other hand, the weak dependence of the excitation energy 
on length may be the underlying cause of discrepancies 
in the literature |1^, |8| [l^l regarding the existence of 
the glass phase for the random RNA. 
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C. Estimation of the phase transition temperature 

Now that we have studied in great detail the behav- 
ior of random RNA in the low and the high temperature 
phase, we describe its behavior at intermediate tempera- 
tures. To this end, we again study the pinch free energies 
AF{N) defined in Eq. (|lj), but this time over a large 
range of temperatures. We concentrate on the sequence 
disorder model Eq. (||) with Um = Umm = 1, and study 
sequences of l engths up to N = 1, 280 . 

From Sees. tlBSJ |lIIA| , and |vbJ, we know that the 
pinch free energy AF(N) depends logarithmically on the 
sequence length N at both low and high temperatures. 
Indeed, this logarithmic behavior seems to hold for all 
temperatures studied. The data for each temperature 
can easily be fitted to the form 



AF{N) =a{T)liiN + c{T) 



The prefactor a{T) is found to depend on temperature 
in a non-monotonic way as shown in Fig. O. The figure 
contains values of a{T) extracted by fitting the data for 
A'' > 160 to the form Eq. (E^). The uncertainty of this 
fit is on the order of the size of the symbols or smaller. 
For high temperatures, we find a{T) « ffesT (dashed 
line in Fig. |l^) as expected for the molten phase. At low 
temperatures, it starts from a finite value of the order 
1 and decreases linearly with ternperature, as a{T) « 
0.97-2.7kBT (dotted line in Fig. |l2|). If we identify the 
glass transition temperature Tg as the intersection of the 
dashed and the dotted lines, we get 



fcsTg w 0.25. 



(50) 



It is interesting to compare this estimate with the lower 
bou nd T* for the glass transition temperature given in 
Sec. [V A . According to the consistency condition (45), 
this lower bound is defined by 



A" 



2/o(r*)] - -fcsT* 



(51) 



with A = ln2. It is necessary to determine the tem- 
perature dependence of the quantity w,„ -I- 2/o(T) on the 
left hand side of this equation numerically. To do this, we 
measure the total free energy of each sequence generated. 
Averaging these free energies over all the sequences of a 
given length N and temperature T, and dividing the re- 
sults by the respective lengths N, we obtain an estimate 
fo{T; N) of the free energy per length which approaches 
the desired foiT) for large N. 

Fig. |l^ shows how these estimates depend on the se- 
quence length N for the lowest temperature ksT — 0.025 
studied. Instead of the free energy per length itself, 
the figure shows the fraction of unbound bases (t{N) — 
Urn + 2/o(T « 0; A^). For short sequences these estimates 
show a clear dependence on the sequence length A^. This 
can be understood in terms of sequence-to-sequence fluc- 
tuations in the maximum number of possible pairings, 
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FIG. 12: Prefactor a(T) of the logarithmic dependence of 
A_F on N for random RNA sequences generated by the se- 
quence disorder model: At high temperatures, the prefactor 
indicated by the circle is well-described by the dashed line 
^ksT expected of the molten phase. At low temperatures, it 
again has a linear temperature dependence and is empirically 
fitted by the dotted line, 0.97 — 2.7fcsT. The numerical un- 
certainty in a{T) is of the order of or smaller than the size of 
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FIG. 13: Sequence length dependence of the fraction of un- 
bound bases a{N) = Um + 2/o(r « 0; iV): The data shown 
is taken at ksT = 0.025. For small N it is dominated by 
the statistical fluctuations in the number of bases according 
to Eq. (B2|) (the dashed line). At large N ^ it saturates to a 
positive constant. 



due to fluctuations in the actual number of each type of 
bases present in a given sequence, even if all four bases are 
drawn with equal probability. This effect can be quan- 
tified by assuming that there is no frustration for small 
A^, i.e., for any given sequence of the four bases A, C, G, 
and [/, a secondary structure with the maximal number 
of Watson-Crick base pairs can be formed. If we denote 



by nx the number of times that the base X appears in 
the sequence, the maximal number M of pairings is given 
by M = imn{nA,nir} + min{nG,nc}. The fraction of 
unbound bases 1 — 2M/N due to this effect can be com- 
puted straightforwardly approximating the multinomial 
distribution of ua — nu by an appropriate Gaussian dis- 
tribution, with the result 



1 



2M 

IT 



(52) 



We expect this effect to be responsible for the increase in 
a{N) found in Fig. |l^. Indeed this effect, as indicated by 
the dashed line in the figure, explains the N dependence 
of <t{N) well for N < 100. However, we also see from the 
figure a clear saturation effect at large N. This satura- 
tion reflects the finite fraction of unbound bases which is 
a frustration effect forced upon the system through the 
restriction on the type of allowed pairings in a allowed 
secondary structure. The unbound fractio n a » 0.08 
is finite asymptotically as expected in Sec. IV A, Note 



that this value is remarkably small, as it implies that in 
the ground state structure of our toy random sequence, 
more than 90% of the maximally possible base pairs are 
formed. But this is an artifact of the very simple en- 
ergy rule used in our toy model. This fraction certainly 
will become smaller if the realistic energy rules are used, 
making the system more frustrated hence more glassy. 

In order to obtain the temperature dependence of the 
quantity Uai + 2/o(r) on the left hand side of Eq. ([sil), 
we will use its value at iV = 1,280 as an estimate of 
its asymptotic value. The results are shown in Fig. |lj. 
The behavior at low temperatures can be described by 
a linearly decreasing function, shown as the dotted line 
in Fig. |l4| According to Eq. (pl|), the temperature T* 
is obtained as the intersection of this curve and lAfcsT, 
shown as the dashed line in Fig. |lj for A = In 2. We find 



knT* 



0.066 



(53) 



which is consistent with the estimate (|5^), but is a rather 
weak bound. Improved bounds on Tg can be made by re- 
laxing the condition of perfec t com plementarity of the 
two segments imposed in Sec. [VA. This leads to larger 
values of the prefactor A 




511), hence a smaller 
and a larger value 



in E. 
slope for the dashed line in Fig. |14 
of T*. While the details of improved bounds will be dis- 
cussed elsewhere, let us remark here that from Fig. |lj it 
is clear that no matter what the slope of the dashed line 
becomes, we will never have T* larger than the tempera- 
ture of ksT w 0.22 where the quantity Um -I- 2/o(T) goes 
below zero. Thus, these estimates will always be con- 
sistent with the observed glass transition temperature of 
ksTg « 0.25. 

Moreover, we note that the low temperature behav- 
ior Mm + 2/o(T) « 0.089 - O.SlfcsT as indicated by the 
dotted line in Fig. O, appears to be roughly related to 
the behavior of a(T)( dotted line in Fig. |l^ in the same 
temperature range, by a single scaling factor of approxi- 
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FIG. 14: Estimation of T*: The symbols show numerical esti- 
mates of the quantity u-^ + 2/o (T) for different temperatures. 
The estimates are obtained by averaging the numerically de- 
termined free energy of 1,000 random sequences of length 
A'^ = 1, 280 generated by the sequence disorder model Eq. (2) 
with Mm = Wmm = 1. The low temperature behavior can be 
described reasonably well by the expression 0.089 — O.SlfcBT 
(dotted hne). The consistency condition (WSl) for the molten 
phase breaks down when this line intersects | ln(2)fcsr (the 
dashed line). This yields fcsT* « 0.066 as a lower bound for 
the glass transition temperature of this system. 



mately 0.1. Thus, it is possible that 

a(T) «A-iK + 2/o(r)] (54) 

if it turns out that A^^ w 0.1 for T < T^. If this is the 
case, then it means th e pro cedure we used to estimate the 
pinch energy in Sec. IV A is quantitatively correct, im- 



plying that the ground state of a random RNA sequence 
indeed consists of the matching of rare segments indepen- 
dently at each length scale. It will be useful to pursue this 
analysis further using a renormalization group approach 
similar to what was developed for the denaturation of 
heterogeneous DNAs by Tang and Chate [B8| . 



V. SUMMARY AND OUTLOOK 

In this manuscript, we studied the statistical proper- 
ties of random RNA sequences far below the denatura- 
tion transition so that bases predominantly form base 
pairs. We introduced several toy energy models which 
allowed us to perform detailed analytical and numerical 
studies. Through a two-replica calculation, we show that 
sequence disorder is perturbatively irrelevant, i.e., an 
RNA molecule with weak sequence disorder is in a molten 
phase where many secondary structures with comparable 
total energy coexist. A numerical study of the model at 
high temperature recovers scaling behaviors characteris- 
tic of the molten phase. At very low temperatures, a 
scaling argument based on the extremal statistics of rare 
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matches suggest the existence of a different phase. This 
is supported by extensive numerical results: Forced de- 
formations are introduced by pinching distant monomers 
along the backbone; the resulting excitation energies are 
found to grow very slowly (i.e., logarithmically) with the 
deformation size. It is likely that the low temperature 
phase is a marginal glass phase. The intermediate tem- 
perature range is also studied numerically. The tran- 
sition between the low temperature glass phase and the 
high temperature molten phase is revealed by a change in 
the coefficient of the logarithmic excitation energy, from 
being disorder dominated to entropy dominated. 

From a theoretical perspective, it would be desirable 
to find an analytical characterization of the low tempera- 
ture phase. If the excitation energy indeed diverges only 
logarithmically, one has the hope that this may be pos- 
sible, e.g., via the replica theory, as was done for an- 
other well known model of statistical physics |Q. It 
should also be interesting to include the spatial degrees 
of freedom of the polymer backbone (via the logarithmic 
loop energy), to see how sequence disorder affects the 
denaturation transition. Another direction is to include 
sequence design which biases a specific secondary struc- 
ture, e.g., a stem- loop pq |. From a numerical point of 
view, it is necessary to perform simulations with realis- 
tic energy parameters to assess the relevant temperature 
regimes and length scales where the glassy effect takes 
hold. To make potential contact with biology, one needs 
to find out whether a molten phase indeed exists between 
the high temperature denatured phase and the low tem- 
perature glass phase for a real random RNA molecule, 
and which phase the molecule is in under normal phys- 
iological condition. Finally, it will be very important 
to perform kinetic studies to explore the dynamical as- 
pects of the glass phase. Despite the apparent weakness 
of the thermodynamic glassiness, the kinetics at biolog- 
ically relevant temperatures is expected to be very slow 
for random sequences [p9| . 



APPENDIX A: HEURISTIC DERIVATION OF 
THE TWO-REPLICA PHASE TRANSITION 

Before we describe the exact solution for the two- 
replica problem, as defined by the partition function Q 
in Eq. (H) and the bubble weight Q in Eq. (|^, we 
first provide here a heuristic derivation of the qualitative 
results. This mainly serves to give a fiavor of the two- 
replica problem in the language of theoretical physics. 

To this end, we define the quantity n(A'^) to be the 
partition function over all two replica configurations of a 
sequence of length A^ — 1 under the constraint that base 
1 and base N —1 form a common bond. It is easy to see 
that 



n{N)=qg{N-2-q) 



where we set 



q = q^q. 



(Al) 



(A2) 



Thus, the critical behavior of g{N\ q) is identical to the 
critical behavior of n(A^) which we will study in the fol- 
lowing. 

Due to the no pseudo knot constraint of the secondary 
structures, n(7V) has a very simple structure, 

U{N + l) = qQ{N-l) (A3) 

+ ^ qQ(^i+£2+^3 + l)n(ni + l)x 

XlV{n2 + l)5l^+l^+l^+ni+n2,N -2+ ■ ■ ■ 

as illustrated in Fig. hS. To simplify the above equation. 
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FIG. 15: Recursion equation for the restricted partition func- 
tion n(A'^ -I- 1): While the first and last of the A'' bases de- 
scribed by n(A + 1) always form a base pair this outermost 
base pair can be followed by a loop with 0, 1, 2, . . . outgoing 
stems. Each of the stems is described by 11 itself while the 
loop is characterized by its total length that can be split into 
the different pieces li in different ways. 



it is useful to introduce the z-transforms 

oo 



iV=l 



Q[y\ = E 2(^)^" 



yN 



N=l 
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of n an d Q . Now applying the z-transform to both sides 
of Eq. ( |A3| ), we obtain 

me''^^qf^Q[y]iC[y,^,]\l+Z[y,f,]me^^e-y (A4) 



singularity disappears and 11 [/i] is governed by the sin- 



-{jC[y,fi]n[^i] 



e^^e.-y\ + 



where 



^[y,/^] 



1 



g-y+M 



(A5) 



and the inverse transform Q,(V) = / 2^Q[y]e^^ was used. 
Eq. (A4) can be simplified greatly to the following 
form, 



n[/x]e 



2/j. 



Ay 



Q\v\ 



27ri/C-i[y,/^]-n[/z]e2A'e-w 



(A6) 



This is reminiscent of the well-known Hartree solution to 
the 0*-theory, or equivalently the self-consistent treat- 
ment of the self-interacting polymer problem [Q , if we 
identify q as the interaction parameter, /C[y,/i] as the 
"propagator" . The usual form of the Hartree equation 



n[/i] 



d^A: 



1 



(2txY fc2 + ^_n[Ai] 



(A7) 



corresponds to the small-y, small- /i limit of Eq. ( [A6| ), 
with —y playing the role of the square of the "wave num- 
ber" k. Note that Q[j/] plays the role of the density of 
(spatial) states, i.e. dyQ\ij\ = ^^,^ , where d denotes the 
dimensionality of the "embedding space" . 

In the context of RNA, de Gennes used this approach 
to describe the denaturation of uniformly attracting RNA 
more than 30 years ago ||lj] . Recently, this approach has 
been extended by Moroz and Hwa to study the phase 
diagram of RNA structure formation |gl[. The analy- 
sis of a self-consistent equation of the type (A6) is well 
known pl^, Eoj. The analytical properties of n[/i] depend 
crucially on the form of Q[y]. Let the singular part of Q 
be 



Qsing oc (y - yc)"~ 



(A8) 



where j/c is the position of the singularity of Q[y]. [Note 
that a = d/2 by comparing the forms of the Hartree 
equations (M) and (|A7|).] For 1 < a < 2, there is only 
one solution for all q>0, with a square root singularity 
in H[/i] at some finite value of fi. For a > 2, there are 
two possible solutions depending on the value of q. The 
square-root singularity exists for q exceeding some criti- 
cal value^ qciq) > 0, while for q < qdq), the square-root 



"^ Note that the critical value qc{q) depends through Q on g but 
not on q. 



gularity of Q given in Eq. (A8). Performing the inverse 
transform and using Eq. (Al), we get G{N; q) ~ M~^C,^ 



where C is a non-universal parameter given by the loca- 
tion of the singularity, while the exponent 9 characterizes 
the phase of the system and is given by the singularity 
of H[/i]: We have 9 — 3/2 if H[/i] is dominated by the 
square-root singularity and 9 — a ii Il[fi\ is dominated 
byQ. 

The interpretation of the two phases with 9 — 3/2 
and 9 = a are straightforward: The phase with 9 — 
3/2 describes the usual RNA secondary structure (see 
Eq. (|2^)); here the bubbles described by Q are irrelevant. 
In the other phase, the result that 9 — a indicates that 
base pairing is not relevant and the system behaves as a 
single bubble. In the context of the original two-replica 
problem, the irrelevancy of the bubbles in the 9 = 3/2 
phase indicates that the two replicas are locked together, 
behaving as a single replica in this phase. In the other 
phase, the attraction of the common bonds is irrelevant, 
and the two replicas become independent of each other. 



As explained in Sec. IIIB, the purpose of the two- 
replica calculation is to determine whether the inter- 
replica attraction, characterized by q here, is irrelevant, 
i.e., whether the system will not yet be in the 9 = 3/2 
phase for a value of q > 9^ • 1- This is only possi- 
ble if qdq) > 0. From the solution of the problem 
described above, this depends crucially on the singu- 
larity of Q, specifically, on whether a > 2. The dif- 
ficulty in ascertaining the form of Q lies in the no- 
common-bond constraint (i.e., ^i D 5*2 = 0) in the def- 
inition of Q (|30|). However, we note that for q = 1, 
the two replica partition function G{N; q = 1) is simply 
the square of the single replica partition function G{N). 
Thus, g{N;q = 1) = G^{N) ~ N-^'>°z^^{q) according 
to Eq. (pq). Since we just convinced ourselves that 9 can 
take on only two possible values, namely a and 3/2 and 
since 26*0 = i y^ 3/2 we conclude a = 29q = 3 > 2 and 
moreover q^ < qdq)- Thus, we do expect the phase tran- 
sition to occur at qdq) > 0. However, it is not clear from 
this calculation if the system at q = 1 (or q — q^) is ex- 
actly at or strictly below the phase transition point. We 
leave it to the exact solution of the two replica problem 
presented in the next two appendices to establish that 
q = q^ is indeed strictly below the phase transition point 
and that therefore disorder is perturbatively irrelevant. 

We note that in the context of the i/i'^-theory or the 
self-consistent treatment of the self-interacting polymer, 
the result a = 3 implies that the embedding spatial di- 
mension is d = 6. Thus, the two-replica problem corre- 
sponds to the denaturation of a single RNA in 6 spatial 
dimensions. The bubbles Q's of Fig. (^) which originate 
from the branching entropy of the individual RNAs play 
the role of spatial configurational entropy of the single- 
stranded RNA in the denaturation problem. 



20 



APPENDIX B: SOLUTION OF THE 
TWO-REPLICA PROBLEM 

In this appendix, we present the exact solution of 
the two replica problem. While most of the details are 
given here, the most laborious part is further relegated 
to App. ^. 



1. An implicit equation for the two-replica problem 

We start by introducing an auxiliary quantity 
W(A^, n;q). This is a restricted two-replica partition 
function, summing over all independent secondary struc- 
tures of a pair of RNAs of length A^ — 1 bases in which 
there are exactly n—1 exterior bases of the common bond 
structure^ all of which are completely unbound in both 
replicas. Since the exterior bases form one of the bub- 
bles of the common bond structure, the possible binding 
configurations of these exterior bases are described by 
Q{n). Thus, the full partition function of the two replica 
problem can be calculated from this restricted partition 
function as 



N 



g{N;q) = Y^W{N,n;q)Q{n) 



(Bl) 



n=l 



Now, let us formulate a recursion relation for W by 
adding one additional base TV to each of the two RNAs. 
We can separate the possible configurations of the new 
function 'W{N + 1, n; q) according to the possibilities that 
the new base N is either not involved in a common bond 
or forms a common bond with base 1 < i < N . This 
yields the recursion relation 



>V(iV-f l,n;5)= (B2) 

N-l 
= W{N,n~ \;q) + q^qY,W{i,n;q)g{N - i-q) 



for N > 2 and n > I. The applicable boundary con- 
ditions are: W{N,N ^ l;g) = 0, W{N,N;q) = 1, and 
>V(A^, n; g) = for each n>iV>l and >V(iV,0; g) = ^o- 
At this point, it is convenient to introduce the z- 
transforms in order to decouple the discrete convolution 
in Eq. (||). They are 



g{z;q)=J2GiN;q), 



-N 



N=l 



Qiz) = J2 Q{N)z-^, and 



JV=1 



An exterior base of a secondary structure is a base that could be 
bound to a fictitious base at position N + 1 without disrespecting 
the no pseudo-knot constraint. 



Wiz,n;q)= Y, yV{N,n;q)z-^^J2 ^iN,n;q)z- 

N=l N=n 

Using Eq. ( [B^ ) and the boundary conditions we get 

zW{z,n;q) = 

oo 



N 



N^ 



''+ J2 W(A^+l,n;q); 



-N 



N=n+1 



.-(»- 



'^+ J2 ^iN,n^l;q), 



-N 



N=n+1 
oc oo 



+i'^Y. Ew(*'";9)2~*a(A^-*;9)^ 



-(N-i) 



AT— n+1 i—n 



^ W{z,n-l;q)+q^qWiz,n;q)g{z-q). 
This can be solved for yV{z,n;q) with the result 



yV{z,n;q)^ 



1 



z-(i'^qg{z;q) 



W{z,n-l;q). (B3) 



Together with the boundary condition yV{z, 0;q) — 1, we 
get 

W{z,n;q)^[z-q'qg{z;q)y''. (B4) 

If we now multiply Eq. ( |Bl| ) by z^^ and sum both sides 
over N we get 



6^(z;g)-^W(z,n;5)Q(r 



(B5) 



which, upon inserting Eq. (B4), becomes an implicit 
equation 



g{z;q) = Q{z-q'qg{z;q)) 



(B6) 



for the full partition function g{z;q), provided that we 
know the function Q. 

Since Q does not depend on g, we can find its form us- 
ing the following strategy: If g = 1, a common bond does 
not contribute any additional Boltzmann factor. Thus, 
the two replica partition function for this specific value 
of q is just the square of the partition function of a single 
uniformly attracting RNA molecule, i.e., 

giN;q^l)=G^{N). (B7) 

Since we know G{N) through the exact expression ( [2^ ) 
for its ^-transform G, we can regard g{z;q = 1) as a 
known function, even though a closed form expression is 
not available. From Eq. (B6), we have 



g{z;q=l) = Q{z-q^g{z;q=l)). 



(B8) 



This is an equation for Q in terms of the known function 
g{z]q — 1). After we solve it for Q below, we can use 



Eq. (B6) to solve for the only leftover unknown g{z;q) 
for arbitrary values of q. 
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2. Solution in the thermodynamic limit 

In the thermodynamic hmit, it is sufficient to consider 
only the singularities of the z-transform Q(z]q). From 
the form of Q in the vicinity of the singularity C{q)i the 
two-replica partition function G{N; q) is readily obtained 
by the inverse z-transform, with the result 



g{N-q) = A{q)N-"Ci^ 



(B9) 



The result immediately yields the free energy per length, 
/ — —ksT ln(. More significantly, the exponent 9 re- 
veals which phase the two-replica system is in: for g = 1 
(i.e. no disorder), the two-replica system is just a product 
of two independent single-replica systems and we must 
have 9 = 3 as implied by the single-replica partition func- 
tion G(iV) inEq. (p3|). On the other hand, forg ^ oo, the 
two replicas are forced to be locked together and behave 
as a single replica. In this case, we must have 9 = 3/2. 
As we will see, 9 — 3 and 9 = 3/2 are the only values 
this exponent can take on for this system; it indicates 
whether or not the two replicas are locked, and hence 
whether or not the effect of disorder is relevant. 

Th e s ingularity C(g) of Q{z;q) is given implicitly by 
Eq. (B6), which we now analyze in detail. We start by 
recalling the solution of the homogeneous single RNA 
problem, Eq. (Eq). From the relation (B7), we have 



g{N,q = 1) = A^{q)N-^'^oz^N foj. i^rge N, wi th 6*0 
3/2, zq = 1 + 2y/q, and ^0(9) given in Sec. II B 2. Hence, 

the 2-transform Q{z; q = 1) = ^^ G{N; q)z^^ is defined 
on the interval [zq,oo[. It is a monotonously decreasing 
function of z, terminating with a singularity at z — Zq 
which produces the 9 = 3 singularity in G{N; q = 1). 

Due to Eq. (pq), the same singularity must occur in 
Q{z) at z = ({0} = Zq — q^gi, where 



51 



G{zo'^<l 



JV=1 



G{Nfz, 



■2N 



(BIO) 



is a positive number and does not depend on any- 
thing else but q. Since z — q^Q{z;q = 1) is a smooth 
monotonously increasing function which maps the in- 
terval [zq,cx)[ into the interval [C(0),oo[, it follows from 
Eq. ( p^ ) that Q,{z) is a smooth, monotonously decreas- 
ing function which maps the interval [C(0),oo[ into the 
interval ]0, gi]. 

Now that we have characterized Q{z) in detail, we can 
proceed to study Q[z] q) for arbitrary q. Clearly, accord- 
ing to Eq. ( p3q ) Q{z;q) has a singularity leading to 6* = 3 
at z = zi (q) , defined implicitly by 



Zl{q) ~ q^Q{zi{q):q) ^ m 
because Q(z) has this singularity at ; 



(Bll) 

C(0). Again 

according to Eq. (B6), we have CJ(zi(g);g) = Q(C(0)) = 
gi independent of q. This leads to one of the key results 



If z = Zi{'q) is the only singularity oiQ{z\q), it would 
imply that there is only one phase with 9 = 3, and the 
free energy per length of the two-replica system is given 
by 



h = -kBT\n[{l + 2^f + q^{q- l)g,] 



(B13) 



for all values of q. By differentiating this with respect to 
q, we obtain the fraction of common contacts 



Sl 



q^qgi 



(1 + 2^9)^ 



q'^qgi 



(B14) 



in this phase as a function of q. For very large disorder, 
i.e., for large q, this fraction converges to one. However, 
since it is the fraction of bonds divided by the total num- 
ber of bases and every base pair has two bases, it has to 
be b ound ed from above by 1/2. Thus, we conclude that 
Eq. ( |B13| ) cannot be the free energy of the two-replica 
system for all (ps. At least for large q, there must be 
another singularity of Q{z;q) which will yield a different 
expression for the free energy to give physically reason- 
able fraction of common bonds. 

In order to find this other singularity, we introduce 
the inverse function Z{g] q) of Q{z;q). From Eq. (B6), it 
follows that for any q and any g €]0, gi], 

g{Z{g;q)-q) = Q[Z{g-q)-q^qg{Z{g-q)-q)] 

^ g = Q[Z{g-q) ~ q^qg] 
=> Q-\g) = Z{g-q)^q'qg. 

Since Q~^{g) does not depend on q, we can eliminate it 
by evaluating the last equation above at the special value 
q = 1 and write 



Zig;q) = Z{g;l) + q^iq-l)g, 



(B15) 



ziiq) = m- 



' q^qgi 



{l + 2^f + q\q-l)g^. (B12) 



where Z{g] 1) is the inverse of the known function 
Q{z;q = 1). Eq. ( |B15| ) is now an explicit solution for the 
inverse of Q{z] q) for arbitrary q, and the singularity of Z, 
located at 17 = 52(g) and Z{g2,q) = Z2{q), yields the free 
energy of the two-replica system, i.e., /2 = —kBTliiZ2, 
in this phase. 

While App. y derives t he po sition of the dominant 
singularity present in Eq. (B15) rigorously, we will re- 
sort to some intuitive argument here. Since 5(z; 1) is 
a monotonously decreasing, convex function, so is its 
inverse Z{g; 1). The latter function ends at the point 
(zq,5i) with some slope z/ < in a singularity which 
produces the 9 = 3 behavior indicative of two inde- 
pendently fluctuating uniformly self-attracting replicas. 
This is shown in Fig. nfl as the solid line. According 
to Eq. ( |B15| ), we can obtain the corresponding function 
Z{g;q) for arbitrary g by simply adding a linear function 
q"^ {q—l)g to this function. If the slope q^ [q— 1) of this lin- 
ear function is less than the smallest slope of Z{g; 1), i.e., 
if zf + q^{q — 1) < 0, adding this linear function does not 
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qualitatively change anything (see the short dashed line 
in Fig. |6[) The only singularity is still the one at g — gi, 
the corresponding value in z, i.e., zi{q) = Z{gi;q), is triv- 
ially shifted by an amount q^i/j— l)gi from Zq as g varies. 
Thus, the scaling behavior is characterized by = 3 as 
iiq — I (i.e., absence of disorders), although the free en- 
ergy /i — — A:BTlnzi(q) is shifted as already derived in 

Eq. §m. 



the root of the derivative, i.e., by 



^ -. ^ - - q>qc 




g2(q) 



FIG. 16: Inverse of the Laplace transformed partition func- 
tion Z{g;q) of the two replica system at various values of the 
common bond interaction q: The solid line shows the free 
system without any interaction of common bonds. In the 
presence of an interaction, the inverse function of the parti- 
tion function can be obtained by adding a linear function to 
the free system function. If the interaction is not too strong 
{q < qc, short dashed line) adding the linear function with a 
small slope does not change the qualitative form of the parti- 
tion function. In this case, the two replica system is controlled 
by the singularity ai g = gi which is independent of q. At 
stronger interactions (g > Qc, long dashed line) the inverse 
function develops a minimum. Beyond this minimum it is 
not invertible any more and the two replica system is then 
dominated by the singularity arising from this minimum. 



If the fi nal slope z/ + q^(g — 1) of the right hand side of 
Eq. ( B15| ) is positive the situation becomes much differ- 
ent: Upon adding the linear function, Z{g; q) develops a 
minimum at some position (52(9)) -^2(5))- Thus, the in- 
verse function Q{z\q) has to be calculated from the left 
(small g) branch of Z{g;q) and has a square root singu- 
larity at Z2 (q) ■ This square root singularity implies that 
the characteristic exponent becomes 6 = 3/2, consistent 
with the picture that in this phase the two replicas are 
locked together and fluctuate as one single effective RNA 
molecule. 

The position .92(9) of the minimum is determined by 



9'(g-l) 



d 



2(5; 1) = 
s=92(g) 



d 



z=Z{g2(qy,l) 

(B16) 
The corresponding value Z{g2{q)]q) determines the lo- 
cation of the square root singularity 22(9) of Q{z]q), i.e., 
the free energy per length of the two replica problem. 

2:2(9) can be conveniently expressed in terms of 
the auxiliary quantity Zcijl) defined through 32(9) = 
g{zc(q)]\) as 



22(g) 



Z{g2{q);q) 

Z(.92(g);i) + 9^(9-1)52(9) 

2c(9) + 9'(9-l)^(zc(9);l). 



Comparing this expression with Eq. ( B13| ) which is valid 
for small 9, we can summarize the complete solution in 
terms of 



Zc{q)={ 



the unique z ^]zq,oo[ that ful- 9 > 9c 
fils£^(z;l)=-l/[92(9-l)] 



(i + 2Vg)' 



q<qc 



where 



and 



zl 



1-4>1, 



(B17) 
(B18) 



— I 

dz \z=z'i, 



g{z;q = i) 



j:N=iNG{N)^z-'^''~'y 
JB19) 
In terms of this Zc{q), the smallest singularity of -2 (z ; 9) 
is located at 



C(9) = 2c(9) + g'(?-i)5(2c(9);i). 



(B20) 



The free energy per length of the two replica system is 
given by 

/ = -fcsTln[ze(9) + q^{q- l)g{zc{q); 1)] (B21) 

and the fraction of common bonds is 
q^qg{z,{q);l) 



Zc{q)+qHq-l)g{zc{q);l) 



(B22) 



This fraction of common bonds turns out to be continu- 
ous at the phase transition but it exhibits a jump in its 

slope at 9 = 9c. 

In the case 9 < Oc, these simplify to Eqs. ( B13| ) 
and ( B14 ) (or Eqs. ( |34| ) and ( p5|) respectively), with 



Zciq) = Zq independent of 9. The type of singularity of 
the Laplace transformed partition function is the same as 
at 9 = 1, resulting in 9 = 3. For 9 > 9c, we cannot write 
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down a closed form expression for C(g) any more. But i t 
is given implicitly in terms of the solution of Eq. (B17); 
it involves only single replica quantities and can thus be 
evaluated numerically. Moreover, we have seen that the 
dominant singularity is in this regime a square root sin- 
gularity implying 6 = 3/2. 



APPENDIX C: THE FREE ENERGY OF THE 
TWO-REPLICA PROBLEM 

In this appendix we give a derivation of the position 
of the non-trivial singularity in the Laplace transform of 
the partition function Q[z;q). A more intuitive, graphi- 
cal derivation of this result was given in App. ^. Using 
Eq. (B15|), we start by calculating 



dz 



G{z;q) 



d 






-, -1 


d 


^ Z{g;l)+q\q-l) 




" d 
d] 


- ^^ S{z;l) 

z=Z{g{z;q);l) 


-1 



This expression obviously has a singularity at Z2 (q) which 
is defined by 



_d 

dz 



G{z;l) 



1 



z=Zig{z2{q):q);l) 



qHM-^y 



(CI) 



Since j-Giz; 1) G [1/z/, 0[, this is only possible for 



z/ 



q>qc = l- . 



For smaller values of q, there is no other si ngul arity and 
the free energy per length is given by Eq. (B13). 

If 9 > 9c the additional singularity Z2{q) exists and is 
— as we will see below — always smaller than the sin- 
gularity Zi{q). Thus, the free energy per length is given 
by the singularity Z2{(i) in the strong coupling phase, i.e., 
for q> 5c- 

On the first sight, Eq. (|Cl| ) still looks as if Z2{q) could 
only be calculated if the full function G{z] q) is known. 
However, for any q > qc 'Vfe can define Zc{q) to be the 
unique solution of the equation 



d 

d^ 



g{z;i) = - 



1 



z=Za(q} 



q^q-iy 



This quantity depends only on the function g{z] 1). Ac- 
cording to Eq. (CI), Z2{q) and Zc(5) B.re related by 
Z{g{z2{q):,q); 1) = Zc{q). This implies t hat g (z2(q);q) = 
Q{zc{q);l). On the other hand, Eq. ( |B15| ) apphed to 

9 = G{z2{q);q) yields 

Z2{q) = Z{g{z2{q);q);q) 

= Z(g{z2{q):q)-A) + q\q-l)g{z2{q);q) 
= z,{q) + q\q-l)g{zM-A) 

which is finally an expression which involves only quan- 
tities of the non-interacting system. Since z + q^{q — 
l)Q{z]l) — Z{Q{z]l);q) is a monotonous function on 
the interval [zq, Zcicj)], we always have Z2(5) < ziicj) with 
equality if and only if Zc(5) = Zq, i.e., for q = 5c- There- 
fore, the free energy per length is indeed given by 



/2 = -kBT\n[z,{q) + g2(q _ l)g{z^[q)- 1)] 
for any q > 5c- 
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